36 research outputs found

Multi-task Pairwise Neural Ranking for Hashtag Segmentation

Author: Maddela Mounica
Preoţiuc-Pietro Daniel
Xu Wei
Publication date: 01/01/2019

Hashtags are often employed on social media and beyond to add metadata to a textual utterance with the goal of increasing discoverability, aiding search, or providing additional semantics. However, the semantic content of hashtags is not straightforward to infer, as these represent ad-hoc conventions which frequently include multiple words joined together and can include abbreviations and unorthodox spellings. We build a dataset of 12,594 hashtags split into individual segments and propose a set of approaches for hashtag segmentation by framing it as a pairwise ranking problem between candidate segmentations. Our novel neural approaches demonstrate 24.6% error reduction in hashtag segmentation accuracy compared to the current state-of-the-art method. Finally, we demonstrate that a deeper understanding of hashtag semantics obtained through segmentation is useful for downstream applications such as sentiment analysis, for which we achieved a 2.6% increase in average recall on the SemEval 2017 sentiment analysis dataset. Comment: 12 pages, ACL 201

arXiv.org e-Print Archive

Crossref

Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary tasks

Author: Aletras Nikolaos
Preoţiuc-Pietro Daniel
Villegas Danae Sánchez
Publication date: 14/09/2023

Effectively leveraging multimodal information from social media posts is essential to various downstream tasks such as sentiment analysis, sarcasm detection and hate speech classification. However, combining text and image information is challenging because of the idiosyncratic cross-modal semantics with hidden or complementary information present in matching image-text pairs. In this work, we aim to directly model this by proposing the use of two auxiliary losses jointly with the main task when fine-tuning any pre-trained multimodal model. Image-Text Contrastive (ITC) brings image-text representations of a post closer together and separates them from different posts, capturing underlying dependencies. Image-Text Matching (ITM) facilitates the understanding of semantic correspondence between images and text by penalizing unrelated pairs. We combine these objectives with five multimodal models, demonstrating consistent improvements across four popular social media datasets. Furthermore, through detailed analysis, we shed light on the specific scenarios and cases where each auxiliary task proves to be most effective.

arXiv.org e-Print Archive

Analysing domain suitability of a sentiment lexicon by identifying distributionally bipolar words

Author: Daniel Preoţiuc-Pietro
Eugen Ruppert
Lucie Flekova
Publication date: 10/04/2020

Contemporary sentiment analysis approaches rely heavily on lexicon-based methods. This is mainly due to their simplicity, although the best empirical results can be achieved by more complex techniques. We introduce a method to assess the suitability of generic sentiment lexicons for a given domain, namely to identify frequent bigrams where a polar word switches polarity. Our bigrams are scored using Lexicographer's Mutual Information, leveraging large automatically obtained corpora. Our score matches human perception of polarity and demonstrates improvements in classification results using our enhanced context-aware method. Our method enhances the assessment of lexicon-based sentiment detection algorithms and can be further used to quantify ambiguous words.

CiteSeerX

Sentiment analysis with genetically evolved Gaussian kernels

Author: Beck Daniel
Blum Manuel
Cohn Trevor
Duvenaud David
Fortin Félix-Antoine
Iqbal M.
Koza John R.
Preoţiuc-Pietro Daniel
Santana R.
Shaffer Juliet Popper
Shah Kashif
Specia Lucia
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019

Sentiment analysis consists of evaluating opinions or statements based on text analysis. Among the methods used to estimate the degree to which a text expresses a certain sentiment are those based on Gaussian Processes. However, traditional Gaussian Process methods use predefined kernels with hyperparameters that can be tuned but whose structure cannot be adapted. In this paper, we propose the application of Genetic Programming for the evolution of Gaussian Process kernels that are more precise for sentiment analysis. We use a very flexible representation of kernels combined with a multi-objective approach that simultaneously considers two quality metrics and the computational time required to evaluate those kernels. Our results show that the algorithm can outperform Gaussian Processes with traditional kernels for some of the sentiment analysis tasks considered.

arXiv.org e-Print Archive

Crossref

BCAM's Institutional Repository Data

Studying user income through language, behaviour and affect in social media

Author: AJ Smola
B Bernstein
BW Roberts
CE Rasmussen
D Freedman
D Kahneman
D Preoţiuc-Pietro
D Rout
Daniel Preoţiuc-Pietro
DM Blei
E Diener
E Snelson
F Pedregosa
FD Blau
H Zou
HA Schwartz
HB Mann
J Bollen
J Cohen
J Eisenstein
L Sloan
Lidia Adriana Braunstein
Nikolaos Aletras
P Ekman
P Elias
RM Neal
Svitlana Volkova
TA Judge
V Lampos
Vasileios Lampos
VN Vapnik
W Ng
W Youyou
Yoram Bachrach
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 22/09/2015

Automatically inferring user demographics from social media posts is useful for both social science research and a range of downstream applications in marketing and politics. We present the first extensive study where user behaviour on Twitter is used to build a predictive model of income. We apply non-linear methods for regression, i.e. Gaussian Processes, achieving strong correlation between predicted and actual user income. This allows us to shed light on the factors that characterise income on Twitter and analyse their interplay with user emotions and sentiment, perceived psycho-demographics and language use expressed through the topics of their posts. Our analysis uncovers correlations between different feature categories and income. Some reflect common belief, e.g. higher perceived education and intelligence indicate higher earnings, or known differences, e.g. gender and age differences; others are novel findings, e.g. higher-income users express more fear and anger, whereas lower-income users express more of the time emotion and opinions.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

UCL Discovery

PubMed Central

White Rose Research Online

Analysing Domain Suitability of a Sentiment Lexicon by Identifying Distributionally Bipolar Words

Author: Flekova Lucie
Preoţiuc-Pietro Daniel
Ruppert Eugen
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/09/2015

TUbiblio

Exploring Stylistic Variation with Age and Income on Twitter

Author: Flekova Lucie
Preoţiuc-Pietro Daniel
Ungar Lyle
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/08/2016

TUbiblio

Combining Humor and Sarcasm for Improving Political Parody Detection

Author: Aletras Nikolaos
Ao Xiao
Preoţiuc-Pietro Daniel
Villegas Danae Sánchez
Publication date: 06/05/2022

Parody is a figurative device used for mimicking entities for comedic or critical purposes. Parody is intentionally humorous and often involves sarcasm. This paper explores jointly modelling these figurative tropes with the goal of improving performance of political parody detection in tweets. To this end, we present a multi-encoder model that combines three parallel encoders to enrich parody-specific representations with humor and sarcasm information. Experiments on a publicly available data set of political parody tweets demonstrate that our approach outperforms previous state-of-the-art methods. Comment: Accepted at NAACL 202

arXiv.org e-Print Archive